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Abstract 

Understanding the mechanisms underlying the observed dynamics of complex biological systems requires the statistical 
assessment and comparison of multiple alternative models. Although this has traditionally been done using maximum 
likelihood-based methods such as Akaike's Information Criterion (AlC), Bayesian methods have gained in popularity because 
they provide more informative output in the form of posterior probability distributions. However, comparison between 
multiple models in a Bayesian framework is made difficult by the computational cost of numerical integration over large 
parameter spaces. A new, efficient method for the computation of posterior probabilities has recently been proposed and 
applied to complex problems from the physical sciences. Here we demonstrate how nested sampling can be used for 
inference and model comparison in biological sciences. We present a reanalysis of data from experimental infection of mice 
with Salmonella enterica showing the distribution of bacteria in liver cells. In addition to confirming the main finding of the 
original analysis, which relied on AlC, our approach provides: (a) integration across the parameter space, (b) estimation of 
the posterior parameter distributions (with visualisations of parameter correlations), and (c) estimation of the posterior 
predictive distributions for goodness-of-fit assessments of the models. The goodness-of-fit results suggest that alternative 
mechanistic models and a relaxation of the quasi-stationary assumption should be considered. 



Citation: Dybowsl<i R, IVlcKinley TJ, IVlastroeni P, Restif O (2013) Nested Sampling for Bayesian IVlodel Comparison in the Context of Salmonella Disease 
Dynamics. PLoS ONE 8(12): e82317. doi:10.1371/journal.pone.0082317 

Editor: Andrew J. Yates, Albert Einstein College of Medicine, United States of America 

Received August 15, 2013; Accepted October 22, 2013; Published December 20, 2013 

Copyright: © 2013 Dybowski et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits 
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 

Funding: RD was funded by the Biotechnology and Biological Sciences Research Council (BBSRC) (grant number BB/1002189/1). TJiVl was funded by the 
Biotechnology and Biological Sciences Research Council (BBSRC) (grant number BB/1012192/1). OR was funded by the Royal Society. The funders had no role in 
study design, data collection and analysis, decision to publish, or preparation of the manuscript. 

Competing interests: The authors have declared that no competing interests exist. 

* E-mail: rd460@cam.ac.uk 



introduction 

IVlodel comparison 

Model-based inference is widely used in life sciences in order to 
assess the plausibility of hypothesised biological mechanisms based 
on data from observations or experiments. One of the most 
common approaches to compare competing models representing 
alternative hypotheses relies on Akaike's Information Criterion 
(AIC) [1]. For a given data set V, the plausibility of the candidate 
models M, is assessed by calculating their respective AIC values, 
AlCr. 



AICi= -2\np(V\0MLE.iMi) + 2n. 



df,r 



(1) 



In (1), 0MLE,i is the maximum likelihood estimate of the set 
parameters associated with model Mj, and n^f is the correspond- 
ing number of degrees of freedom. If AIC\ < AIC2 then M \ is 
more plausible than A^2, with respect to 23, in the sense that the 
KuUback-Liebler divergence of M \ from the true model is smaller 
[2]. 

An important drawback to the classic approach to model choice 
is that it is based on a single point estimate Omlej of 0, the 
uncertainty in 0 being ignored. In contrast, the Bayesian approach 
considers a probability distribution for 0, with /)(0(,)|X>,A^,) 
expressing the uncertainty in given V (for a model M{i)). 



Suppose that we wish to select a model from a set of candidate 
models {M\, . . . ,Mm} given our observation of data V. We can 
express this goal probabilistically by stating that the aim is to 
determine the most probable model: argmaxx, p{Mi\V). 

From Bayes' theorem, we have 



p(Mi)p(V\Md 
Y:jLiPiMMV\Mjy 



(2) 



therefore, \Sp(Mi) is known, or considered to be equal for all Mi 
then the focus is on the model evidence p(V\Mj). 

If d(i) is the set of parameters associated with model M.i, the 
Bayesian approach to p{V\M(i)) is to integrate over all possible 
values of : 



p{V\M(,))-- 



p(D\0fi)Mi)p(0(o\Md d0(,y 



(3) 



In addition to allowing for parameter uncertainty, (3) intrinsi- 
cally penalizes against models that are better able to fit to observed 
data because of their complexity [3], thereby removing the need 
for an explicit complexity penalization term. 

The integral of (3) can be estimated analytically or numerically. 
In analytical approaches, the integral is approximated by the 
adoption of simplifying assumptions; for example, as used for 
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derivation of the Bayes Information Criterion [4]. Numerical 
approaches are based on some form of Monte Carlo sampling 
such as Gibbs Sampling [5], 

One approach to estimating the integral 



£(«) it(9) 

[ piV\0,M)/tp(O\M)/td0 

Jq 

numerically is to sample 0 randomly from its prior, 

f £(0)71(0) d0K^Y, -CCf «:),where 0k ~n; 



(4) 



however, the prior n(0) is often concentrated in places where the 
likelihood C(0) is relatively low. This problem becomes more 
severe in high-dimensional parameter (0) spaces, or in problems 
where the likelihood function £(0) is concentrated in a very small 
region. 

To overcome the problem, SkUling [6,7] proposed a means of 
estimating C(0)n(0)d0 that, by design, samples 0 sparsely from 
the 0 space where the likelihood C(0) is low, and densely where 
C(0) is high, by means of 'nested sampling', which is the focus of 
this paper. A recent addition to the Bayesian arsenal, nested 
sampling has been used in cosmology to compare alternative 
models of the universe against observed data [8]. Outside of 
physics, it has, so far, received little attention [9,10]. 

Within-host dynamics of a bacterial infection 

Quantitative research on infectious disease dynamics has 
undergone rapid development over the last two decades, 
motivated by concerns about emerging infections that can spread 
globally and about the evolution of pathogens resistant to existing 
control measures such as antimicrobials and vaccines. Bayesian 
computation has become the method of choice to fit stochastic 
dynamic models to epidemiological [1 1] or experimental datasets 
[12]. This is in large part due to the appeal of being able to 
produce measures of uncertainty and correlation for the model 
parameters based on their posterior probability distributions. 
Similarly, models for within-host dynamics of infection have more 
recently started to benefit from Bayesian inference approaches 
[13]. 

Salmonella enterica causes systemic diseases (typhoid and paraty- 
phoid fever) [14], food-borne gastroenteritis and non-typhoidal 
septicaemia (NTS) [15] in humans and in many other animal 
species world-wide, which also cause a very serious problem for 
the food industry. The global burden of typhoid fever is estimated 
at ca. 22 million cases with a mortality estimated at ca. 200,000 
deaths per year [14,16]. Paratyphoid has an estimated 5.4 million 
illnesses worldwide [16]. The high incidence of these diseases, that 
affect both travellers to and residents in endemic areas, and 
threaten infants, children and immunodeficient patients, dictates 
the urgent need for more efficacious preventive and therapeutic 
measures. 

In the mouse model of systemic infection. Salmonella reside and 
proliferate mainly within phagocytic cells of the spleen liver, bone 
marrow and lymph nodes [17—19]. ObscT\'ation of Salmonella by 
fluorescence microscopy in the tissues of mice has revealed that a 
key feature of systemic infections with wild type bacteria is the 
presence, on average, of low bacterial numbers within individual 
phagocytes irrespective of net bacterial growth rate and time since 
infection [20-23]. 



In an effort to understand the dynamics that underpin the 
intracellular numerical distributions of Salmonella within the host 
cells, and to capture the essential traits of the cell-to-cell spread of 

the bacteria, we have used mathematical model frameworks for 
the intensity of intracellular infection that links the quasi-stationary 
distribution of bacteria to bacterial and cellular demography. An 
example of this the work done by Brown et al. [24], who compared 
the observed distribution {C„}, where C„ is the number of cells 
with n bacteria, across 16 candidate infection models. The models 
under consideration were as follows: (a) one homogeneous model, 
in which, for every cell, burst occurred only when the number of 
bacteria « in a cell reached a single burst threshold Afmaxl (b) five 
heterogeneous models having a probability distribution of burst 
thresholds; and (c) eight stochastic models for which there is a 
probability that a given cell will undergo burst. Two datasets were 
analysed, one for a virulent strain of bacteria and the other for an 
attenuated strain. Brown et al. [24] computed the maximum 
likelihood estimates of the parameters of each model, and selected 
the 'best' model based on the corresponding AIC values. 

In order to o\'ercome the issues raised by AIC discussed above, 
we decided to re-analyse the datasets and re-assess the models 
within a Bayesian framework. 

Methods 

What follows is an elaboration of the description of nested 

sampling given by Skilling [6,7]. 

Nested sampling 

The expected value of a function g of a random variable X is 
given by 



Ete(X)) = 



g(x)/x(x) dx 



where /X is the pdf of X. On comparing this expression with the 
target integral C(0)n(0)d0, it is clear that 



C(0)n(0)d0=E(C(0))- 



(5) 



that is to say, the expected value of the likelihood under the prior. 
The cumulative distribution function Fx(x) with respect to a 
random variable X is defined by 



Fx(x)=p(X<x)-- 



fxiy) dy 



and is related to the expectation E(A^ by 



TOO 

E(X)= (\-Fx(x))dx, 

JO 

[25]; consequentiy, from (5), we obtain the important relationship 



f poo pi 

C(0)n(0)d0= (l-Fc(X))dX= (l-Fc(X))dX, (6) 

Jq Jo Jo 

where X is likelihood, and C in the right-hand integral is equal to 
C(0). The reason why (6) is important is that the multivariate 
integral on the left-hand side has been equated to a univariate 
integral. 
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Since 0 has a distribution defined by prior n, and £, = £(0), it 
follows that C has a probabihty distribution and thus a cumulative 
distribution function. 



FcW=piC<X)= fc(y)dy. 



Jo 



(7) 



which is present in the integrand of the right-hand integral of (6). 

We can replace (1 —Fjc(Xy) dX in (6) with a more accessible 
integral by the following steps. First, since the pdf of C is 
connected to the pdf of 0 via C{0), we can write 



tfcb')dy=\ n(0)d0; 
Jo Se-.cwxx 



m< 

thus, from (6), (7) and (8), we can write 

£{0)K(0)d0= 1- 
J/l = 0 V 

71(0) d0 dl. 

B:CX«)>1 



(8) 



e:C(e)<x 



n(0) do dX 



(9) 



It will be convenient to rewrite the inner integral of (9) as ^{X) to 
give 



C{0)n{0)d0 = 



(j>(X) dX, 



(10) 



where <I>(X) is the probability of selecting 0 from the prior n such 
that C{0)> X: 



^{X)=\ n(0)d0. 
}e:C(f)>x 



(11) 



The above sequence of determining 0„i„ and the corresponding 
^ is performed on the new set of points, giving rise to Xmm,! and (^2- 
Point 0mm,2 is replaced by a 0 drawn from the new restricted prior 
Ti\{C(0)> A„,i„2)- In other words, 0 is sampled from 
Q.2 = {0\C{0) > ^,„,2}, for which ilz C ill . 

This cycle is repeated until some stopping criterion has been 
reached. If this termination occurs at the J-th iteration then the 
resulting values of Xmm,i and will be 



ij<ij-i<--- <ii, 

and the resulting sequence of £1 subsets is 



hence the term nested sampling. 

Model evidence Z= C(0)n(0)d0 can be estimated from the 
recorded Xmi„^i and (^,- values by means of the approximation 



Z-- 



(13) 



where / is the number of iterations used, and AZ is a vertical 
rectangular segment under the curve of Figure 1 . 

Algorithm 1 (Table 1) describes the above process in 
pseudocode. 

Practical adjustments to the algorithm 

We will now consider how some of the aspects of Algorithm 1 
can be implemented. 

Segment AZi used in (13) could be evaluated by the trapezoidal 
approach 



Introducing i = (j){X), hence X = <j) {^), we can rewrite the 
previous integral as 

\ C(0)n(0)d0=( r\Odi, (12) 

Jq Jo 

where ^"'(0 is that likelihood X such that p{£{0)>X) = ^ (cf 
Equation (11)); for example, if 0"'(O.9) = O.OO42 tiien 90% of 0 
drawn from the prior n(0) will have likelihoods greater than 
0.0042. 



but Sivia and SkiUing [26] have found 

AZi = Xminji^i- 1 ~ id 

to be adequate (line 9 in Algorithm 1). 

Line 7 in Algorithm 1 used the assignment ii*-P(C(0)> Xminj), 
but an alternative approach is to replace this assignment with 
<^,<-E[i^,]. An approximation of E[<^] is derived as follows. Let t,- 
denote the ratio with ^0 = 1. At the kth iteration we have 



The algorithm 

The main steps of the nested sampling technique are as follows. 
First, n points 0 (i.e., parameter vectors) are sampled from the 
prior n, and their corresponding likelihoods C(0) determined. The 

point 0mm,\ having the smallest likelihood is determined and its 
likelihood /tmi«,i is recorded. Furthermore, the probability Cj that 
C(0)>X 

nun,\ is also recorded. Point OminA is replaced by a new 6 
drawn from the prior n but restricted to those 0 for which 
£,{0)> Xmin,\- In other words, a restricted prior is used: 
n\(C(0)> Xmin,l)- If £i is the set of all possible 0 then the set 
ill = {0\£'i0)> Xmm,\} is a subset of il. 



and so 
therefore, 

E[Iog4]=EiiE[logT,-]. (14) 

Now, 
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Figure 1. The shaded area below the curve for <^ '(c) is equal to C(0)n(0) dO. See Equation (12). 
doi:10.1 371/journal.pone.008231 7.g001 



E[logT,l 



log T p{x) dx 



[T"l0gTi;- 



n 



logT m" ' dz 



[27] 



therefore, from (14), 



Table 1. Algorithm 1: The nested sampling algorithm. 



Input: (a) likelihood function C{9); (b) prior n{d)] (c) number n of active parameter vectors in use during nested sampling. 
Out put: an estimate Z of Z. 

1 : Let S be a set of n parameter vectors 0| , . . . ,d„ n 
2: Z^O 
3: I 

4;whlle terminating condition not satisfied do 

5: 0mm,i ^ arg min C{9) 

6: A,„i„,i^C(g,„i„j) 

7: c,<-P(£(e) >/„„„,,) 

8: if i > 1 then 

9: AZi<-A„i„j{£i_i-^i) l>Estimated segment of Z 

10: Z^Z + AZi 

1 1 : e„„ ~ n\(C{0) > ;.„„„.,) >Restricted prior 

12: S^S-{0,„,i,.,} 

13: S^SUR™} 
return Z 



doi:l 0.1 371/journa!.pone.008231 7.t001 
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Table 2. Algorithm 2: An implementation of Algorithm 1 in which practical adjustments are included. 



Input (a) likelihood function £{&); (b) prior n(0); (c) number n of active parameter vectors in use during nested sampling; (d) procedure for determining a region 7^(S) of 
parameter space that encloses a set of parameter vectors S; (e) fraction / of Z to be estimated. 

Output: an estimate Z of Z. 

1 : Let S be a set of n parameter vectors 6[, . . . fi^ n 

2: Zi-O 

3: 

4:Repeat 



5: 


ffminj arg min C{9) 




6: K,i„j^C(e,„i„j) 


7: 


C,^P(£(fl)>A„„„,,) 




8: 


if ; > 1 then 




9: 




OEstimated segment of Z 


10: 


Z^Z + AZi 




11: 


o„„ ~ 7i\ieen(s) AC(8)> x„,i„j) 


>Restricted prior 


12: 


S^S-{0„„i,.,} 




13: 






14: 


until: A„„„ ,Qi <fZ 


>The stopping condition 



return Z 



doi:1 0.1 371/journa!.pone.008231 7.t002 



E[logi 



E[4]«exp 



Since the logarithm function is strictly increasing and concave, 
we have, from Jensen's inequality, that 



and thus 



logE[c,l<E[log,y 



E[^,,]<exp 



however, Sivia and SkiUing [26, p. 186] drop the inequality and 
use the approximation 



As regards the termination of Algorithm 1, there is no rigorous 
criterion as to when the algorithm should be stopped, but SkiUing 
[7] and Feroz and Hobson [28] have found 

to be an effective stopping condition, where / is the fraction of Z 
that will not significantly contribute to the estimate of .E (according 
to a user-defined value). 

Chopin and Robert [29] have shown that the asymptotic 
variance of the nested sampling approximation typically grows 
linearly with parameter dimensions. 



Table 3. Probability distributions d(U\0) for the burst thresholds M. 



Model 



Distribution 



Parameters, ff 



d(M\0) = r8n^i +(1 -r)V,Af2 
d(J\r\e) = ( W/Ar!)exp( - X) 

dm)=['^+j^-')p^ii-P)' 

d(M\e)=p(\-pf-' 



AA„„„e{2, . . . ,40} 

AA,e{2, . . . ,5}, J\riB{2, . . . ,30}, re|0,l| 
;.e(0,40] 

re{30,...,45},pe|0,l] 
re{l,...,30},p6(0,l) 
;,e(0,ll 



(1) Unimodal Kronecker, (2) bimodal Kronecker, (3) Poisson, (4) binomial, (5} negative binomial, and (6) geometric. 
doi:l 0.1 371/journa!.pone.008231 7.t003 
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Table 4. Parameters used for the eight stochastic models 
based on (15) and (16). 



Parameters, & 



Model 


/'o 


Ai 








7 


/'o 


0 


0 




0 


8 


0 


A'l 


0 




0 


9 


0 


0 


l-'o 




0 


10 


1^0 


^'^ 


A'2 




0 


11 


Aio 


0 


0 




«(, 


12 


0 




0 






13 


0 


0 


^'2 






14 


lk> 




A'2 







For each model, some of the parameters were set equal to constant values, 
which effectively removed the parameters from the model. The range of values 
considered were /^o,/f|,/f2e[0,2] and a,,G[0.1]. 
doi:l 0.1 371/journal.pone.008231 7.t004 



Finally, there is the structure of the restricted priors. Each new 
point 0„ew for a set S of active points is sampled from prior n 
conditioned on the restriction that C{0„e\v)> ^•■min- Rather than 
searching across the entire 0 -space for such a point, it is more 
computationally efficient to restrict the search to a region 7?.(S) 
that contains S. We have used rectangular cuboids for TZ(S). 

Incorporating the above points into Algorithm 1 leads to 
Algorithm 2 (Table 2). Before applying the algorithm to our 
experimental datasets, we tested it on a simple two-parameter 
likelihood function C{a,li) = ir'°(\-!if^li^^(\-Pf°. The analyses 
and results are presented in Methods SI, 

The Salmonella models 

Evidence p(V\M) was estimated by nested sampling with 
respect to two groups of models associated with within-host S. 
enterica infection, were each model M provides an expression for 
the probability q(n\0,M) that a host cell contains n bacteria. 

In the first group of models, infected cells are assumed to burst 
when the number of bacteria they contain reach a fixed threshold 

Table 5. Algorithm 3: Estimation of q(n\0,M) using an 
iterative estimation of the infection rate constant y. 



Input: parameters 0 for model M. 

Output: an estimate of probabilities q(\\8,M),q(2\e,M) ,q{n,„„\e,M). 

I:y<-1 Initial value for )' 

2: )'„/,/ -10-/ 

3: while: |(r-lw)/r„/,/l >0.01 do 
5:/-^[l/2/z/3,.../2.../_J 

where/, = (K„_i(n— 1 ))/()' + + ^Q,^ii,/i2)) >Equation (21) 

6: ?i^(E™"T^'H)"' >Estimate of f/(I|6»,7W) 

7: P^q\F ^Estimates of q{n]0.M) where n=\.. . . 

8: P<-PI 2™T ^Normalisation of the estimated probabilities 

return: P= [q(l\e,M),q(2\eM), ■ ■ ■ ,?(«,„„.vl»,A^)l 
doi:l 0.1 371/journal.pone.008231 7.t005 



Table 6. The number Q of cells containing n bacteria when 
virulent (SL5560) and attenuated (SL3261) strains of bacteria 
were used. 







n 






Virulent 


Attenuated 


1 


655 


1189 


2 


250 


396 


3 


87 


104 


4 


86 


70 


5 


54 


40 


6 


42 


25 


7 


13 


8 


8 


30 


10 


9 


8 


9 


10 


19 


3 


11 


5 


7 


12 


12 


4 


13 


5 


3 


14 


1 


4 


15 


6 


0 


16 


3 


2 


17 


2 


1 


18 


0 


2 


19 1 1 


20 


4 


0 


21 


0 


0 


22 


0 


0 


23 


0 


0 


24 


0 


1 


25 


1 


0 


26 


0 


0 


27 


0 


0 


28 


0 


0 


29 


1 


0 


doi:1 0.1 371/journal.pone.0082317.t006 



A/". The probability distributions considered for A/" are shown in 
Table 3. 

For the second group of models, the assumption is that, instead 
of pre-programmed burst thresholds N, there is burst rate ji that is 
a function of the number of bacteria « in a cell. For these models, 
the general relationship is 

M«;/^o./^i'/^2) = A'o + /'i« + /"2«^ (15) 

where |lQ,)i^,)i2£^,'X)). Furthermore, the rate of bacterial replica- 
tion a„ is assumed to be related to n by 

a„ = (Zo exp( — ac«) (16) 

where ao,aie[0,Go). As explained in Brown et al. [24], in the 
dynamic model, time can be re-scaled by the baseline replication 
rate (Xq, therefore this parameter cannot be estimated using the 
quasi-stationary distribution. For convenience, we set ao = 1 , so 



PLOS ONE I www.plosone.org 



6 



December 2013 | Volume 8 | Issue 12 | e82317 



Nested Sampling - Salmonella 




T3 
O 



14 


■ i 




1 1 




13 










12 










11 






i-n>i 




10 


■ 1 








9 










8 


■ 1 








7 


■ 1 








6 


■ 1 








5 






i--[rFi 




4 


- 1 








3 










2 


- 1 








1 


• 1 


1 1 


> 1 





0.00 



0.05 



0.10 



0.15 0.20 

PiMAV) 



0.25 



0.30 



0.35 



Figure 2. Estimates of tKie posterior model probabilities p{M\V) when using data from (A) the attenuated strain and (B) the virulent 
strain. 

doi:10.1371/joumal.pone.0082317.g002 



that the values of other parameters are relative to the baseline 
replication rate. The parameters of the eight stochastic models 
considered are shown in Table 4. 

Under the assumption that the number of host cells infected by 
n bacteria reaches a quasi-stationary distribution, the probability 
g(n\0,A4) that a cell contains n bacteria can be derived for the 14 



models [30]. For Model 1, we have the relationship 

/ 1/1 ^ ^ \ max 

'^^"''-^^= (.V,„..-lK. + l) - 

For Models 2 to 6, the relationship is 



(17) 
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Table 7. Median — logjo (Z) estimated for Models 1 to 6. 



Model 


Distribution 


Attenuated 


Virulent 


1 




77.59 


38.56 


2 




69.49 


92.79 


3 


d{Ar\e) = (W/Af !)exp( - 1) 


53.75 


34.09 


4 




245.87 


281.46 


5 




30.26 


34.18 


6 


d(Af\8)=p(l-pf-' 


84.26 


79.97 



The highest model evidence Z (bold) and second highest model evidence 

(italic) models are highlighted. 

doi:l 0.1 371/journal.pone.008231 7.t007 



1 " 



For Models 7 to 16, we have the recursive relationship 



(18) 



q{n\g,M)-- 



0!«-l(«-l) 



>' + 0£„« + ^((M;^(o,/.i,,/i2) 

where the infection rate constant y is given by 



q(n-\\0,M), (19) 



(20) 



The value for q(l | 0, Ai) can be handled as follows. Let 
o:„_i(«-l) 

Jn - 



(21) 



y + o:„« + /i(w;/.io,/i,,/i2)' 
so that (19) can be written as q{n\0,M) =f„q{n — l\0,M), then 

q(n\0,M) = q(M0,M) + q(2\0,M) + qOWM) +■■■ 

= q{\\0M)^hq{\\0M)^hhq{\\0M)^ ■ ■ ■ 
= q{\\0M)(\+f2+f2f^+---) 

but Y.'^^'^i q(n\0,M) = h therefore, 

?(l|0,A1) = (l+/2+/2/3+ ■••)"'• 



When bacterial replication is not dependent on n, = 0, in 
which case >' = 1, but when replication is density dependent, (19) 
and (20) need to be solved self-consistently. This can be done by 
assuming an initial value for y, computing q(n\0,M) from (19), 
updating y using (20), and repeating this iteratively until y no 
longer changes significantly. This process is shown in Algorithm ,3 
(Table 5). 



Table 8. Median 
7 to 14. 



logiQ {Z) estimated for stochastic Models 



Parameters, ff 



Model 


Ao 


Ai 


//2 






Attenuated 


Virulent 


7 


/'o 


0 


0 




0 


27.21 


38.56 


8 


0 


/'I 


0 




0 


28.00 


36.93 


9 


0 


0 






0 


38.80 


35.24 


10 


/lo 


/'! 






0 


29.21 


39.21 


11 


A'o 


0 


0 






27.32 


34.27 


12 


0 


/'I 


0 






30.13 


34.43 


13 


0 


0 


ft 






41.25 


34.60 


14 


A'o 


/'I 








30.04 


36.34 



The highest model evidence Z (bold) and second highest model evidence 

(italic) models are highlighted. 

doi:1 0.1 371/journal.pone.008231 7.t008 



Likelihood function 

With expressions for q(n\0,M) established for all the models, we 
can now determine the likelihood L(0) required for Algorithm 2. 
Following Brown et al. [30], we can express the likelihood function 
by a multinomial distribution: 



C{0)=p(V\0,M) 



--p{{Cn}\{q(n\0M)}) 



^^"':^^ q{n\0,Mf" 

n = 1 C„\ 



(22) 



(23) 



(24) 



where {C„} is the observed distribution of C„ (the number of cells 
with n bacteria), and Y'= ^„ C„, if observations are assumed to be 



Table 9. — logm {Z) estimates for all models. 



Model 


Attenuated 




Virulent 






min 


median 


max 


min 


median 


max 


1 


77.56 


77.59 


77.63 


38.55 


38.56 


38.58 


2 


69.38 


69.49 


69.66 


92.66 


92.79 


92.88 


3 


53.71 


53.75 


53.79 


34.07 


34.09 


34.10 


4 


245.83 


245.87 


245.91 


281.36 


281.46 


281.50 


5 


29.93 


30.26 


30.52 


34.16 


34.18 


34.24 


6 


84.23 


84.26 


84.30 


79.93 


79.97 


80.01 


7 


27.19 


27.21 


27.24 


38.52 


38.56 


38.58 


8 


27.94 


28.00 


28.02 


36.88 


36.93 


36.97 


9 


38.78 


38.80 


38.85 


35.20 


35.24 


35.98 


10 


29.06 


29.21 


29.39 


38.66 


39.21 


43.12 


11 


27.28 


27.32 


27.38 


34.24 


34.27 


34.28 


12 


29.99 


30.13 


30.36 


34.39 


34.43 


34.50 


13 


40.93 


41.25 


41.84 


34.53 


34.60 


34.63 


14 


29.86 


30.04 


30.48 


36.19 


36.34 


39.65 



doi:1 0.1 371/journal.pone.008231 7.t009 
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Figure 3. An estimate of the marginal probability distribution ;7(/ioP,Model7). V is data from the attenuated strain. 
doi:1 0.1 371 /journal.pone.008231 7.g003 



independent. Garca-Perez [31] provides an algorithm for the 
accurate computation of multinomial probabilities. 

As regards the prior Ti(0) for a model M, it will be assumed to 
be uniform across the parameter space of interest for that model; 
consequently, the prior wiU be set equal to the reciprocal of the 
size of the parameter space. More precisely, 



71(0) = 71(01,..., 0b) = 



n max(0,)- 
/— 1 



■niin(0,) 



A continuation approach 

The theory underlying nested sampling assumes that all the 
parameters for a model have continuous values, however, this wUl 
not necessarily be the case in practice. For example, the binomial 
model (Model 3) has a discrete parameter n and a continuous 
parameter p. 

It is possible to formulate a theory of nested sampling for 
discrete parameters by replacing integrals with summations, but 
modifications to Algorithm 2 would be required to take account of 
the fact that, if 0 is discrete, several points could occupy the same 
location in parameter space. 

An alternative response to the presence of discrete parameters is 
to use a type ot continuation approach [32]; in other words, if/(-x:) is a 
function defined only for integer values of x, replace it with 
another function g(x) that takes real values, but for which 
g{x)=f(x) when neN (or No). 

For Model 2, the Kronecker delta dj^/j^. can be replaced with a 
narrow Gaussian function exp( — (A/"— A/",)^) with A/';e[l,oo). In 
the case of Model 1 , continuation can be applied directly to ( 1 7) by 
allowing J\f maxS[l ,<Xi) . 



For those models using a factorial of a parameter (i.e.. Models 4 
and 5), we can replace x! with T(x+ 1) since F is a function of a 
real value. 

The data 

The data V consisted of the number C„ of mice cells observed 
(via fluorescence microscopy) to contain n S. enterica bacteria: 

V={C„}^^l. One dataset was used for a virulent bacterial strain 
(SL5560); another for an attenuated strain (SL3261). The infected 
cells were taken randomly from various locations in the liver. The 
observed C„ values are shown in Table 6. 

The data was pooled. If Ci'' denotes the number of cells having 
n bacteria on day t then, for the virulent strain, Brown et al. [24] 

used C„ = Ci^' + Cl'*' , and for the attenuated strain they used 



AW] Ai2] 



Posterior model probabilities 

If we assume that the set of candidate models is exhaustive, we 
can apply (2) to estimate the posterior probability piAdlV) for each 
model. Furthermore, if p(A4,) is assumed to be equal for all 
models, we can use 



p(Mi\V)-- 



7 = 1 



(25) 



There are 1 4 models, each arbitrarily having 1 0 estimates of Z, 
but it is impractical to systematically apply each of the lO'* 
possible combinations of .2 to [25]; therefore, the Z values were 
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0.00 0.02 0.04 0-06 0.08 0.10 0.12 

Parameter, 

Figure 4. Estimates of tKie marginal probability distributions />(/.ioP,Modelll) and p(ae\'D, Modell\). V is data from the attenuated strain. 
doi:10.1371/journal.pone.0082317.g004 



chosen randomly in order to obtain distributions for p(M\Ty). The 
resulting distributions are shown in Figure 2. 

An alternative approach to Bayesian model comparison is to use 
the Bayes factor p(V\Mi)/p{D\M.j). This provides a relative 
comparison of models and JVlj but not the absolute values of 
their posterior probabilities p{M\D). 



Results 

The estimated model-evidence values Z obtained by nested 
sampling for each model is shown in Tables 7 and 8. The ranges 
are shown in Table 9. 

With respect to the data from the attenuated strain, the most 
probable model was Model 7 (^(q only) followed by Model 1 1 (/Xg 
and flf). With respect to the data from the virulent strain, the most 
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30 



Figure 5. An estimate of tlie marginal probability distribution /7(2|X),Model3). V is data from the virulent strain. 
doi:10.1371/journal.pone.0082317.g005 



probable model was Model 3 (Poisson) followed by Model 5 
(negative binomial). 

Parameter distributions 

After having estimated the most probable model, it is of 

interest to estimate the posterior joint probability of the 
parameters 0 with respect to V and A^*: p(0\T>,Ai*). 

From Bayes' theorem, we can write 



p{0\VM*)-- 



C(0\M*)n(g) 



(26) 



and the denominator of Eqn (26) can be estimated by nested 
sampling: 



p{0\VM*)-- 



C(0\M*)n{0) 
Z 



(27) 



Parameter estimation directly from nested sampling 

The parameter sequence {0miii,l^0iiim,2, ■ ■ ■ ,0mm,j} is produced 
during nested sampling. Can this set of parameters be regarded as 
a random sample from p{0\'D,A4)? Sivia and SkUling [26] 
proposed using {0min,k}i=i for this purpose so long as it is 
weighted by n'f: = AZi,/Z, where AZk = l„u„^ki^k_i - i/,), on die 
basis that WkKp(0mm,k\'D,M)- A theoretical justification for this is 
given by Chopin and Robert [29]. 

The appropriateness of regarding {0mm,k}k=i ^ random 
sample from p(0\'D,M), was ascertained empirically using the 
Kolomogorov-Smirnov test, as follows. 

The Kolmogorov-Smirnov statistic D„ is given by 

D„=s\ip\(x)-Fo(x)\, 

xeR 

where Fo{x) is the cdf of the null-hypothesis pdf, and F(x) is the 
empirical cdf obtained from a sample {X, }: 



Parameter estimation via reject sampling 

Distribution p{0\V,A4) can be estimated using reject samphng 
with approximation (27). As part of this process, the maximum of 
p{0\'D,Ai) can be determined by performing Nelder-Mead 
simplex optimisation with respect to this distribution over 
parameter space. 

The estimated parameter distributions obtained by reject 
sampling for Models 3, 5, 7 and 11, are shown in Figures 3, 4, 
5, and 6, respectively. In each case, the sample size n was 10000. 
The samples obtained by reject sampling were also used to 
construct density scatter plots (Figures 7 and 8), which provide a 
visualisation of the correlations between the parameters. 



F(x)=-Y,H^.<x}- (28) 
" i=i 

This definition can be generalized to a weighted Kolmogorov- 
Smirnov statistic by replacing (28) with a weighted cdf 

n 

Fix)=Y^wi\{Xi<x}. 

i=i 

This allows us to take account of the weights {u'/tj^^j on 

{0min,k} = 1 - 

Applying this method to the toy model Miov presented in 
Methods SI, a sample {0mm,k}i=i, with /= 10586, was obtained 
by performing nested sampling for the evaluation of evidence 
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Parameter, p 



Figure 6. Estimates of the marginal probability distributions p(r\'D,Model5) and ;>(/i|X),Model5). V is data from the virulent strain. 
doi:10.1371/journal.pone.0082317.g006 



p(V\Aitoy), where 0 = [oi,fl]T. The corresponding sample 
{^mm,k}k=i was compared with the marginal beta distribution, 

using the weighted Kolmogorov-Smirnov statistic Dj. This statistic 
was equal to 0.01298. In order to obtain a frequentist p-vahxe for 



the statistic, an empirical probability distribution for Dj was 
obtained by randomly selecting a set {ak}i=] of a values from 
p{cx.\-,Aitoy) and determining Z)y for the set, this being done 10000 
times. On comparing 0.01298 with this empirical distribution, the 
p-vahie for {oiimin,k}k=\ was found to be 0.0276. In contrast, when 
a sample of size / was obtained by reject sampling from 
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0.08 



0.10 



Figure 7. Density scatter plot of the estimated joint probability distribution />(^io,ctt>|X',Modelll). V is data from the attenuated strain. 
doi:1 0.1 371 /journal.pone.008231 7.g007 



p{ix\-,Mioy), the value of unweighted Dj was 0.00630, which has a 
p-v3lue of 0.5772. 

As a result of tliis experiment, it was decided not to use 
{0mm,k}i=i for estimating parameter distributions. 



Model checking 

It does not follow that the most probable model from a set of 
candidate models is necessarily an acceptable model: the most 
probable model may be the least worst of a set of poor models. 
What is required is an assessment of the fit of the most probable 
models to the observed data. 



120 



100 



4-1 

<D 

£ 
03 

03 
Q- 




0.4 0.5 0.6 0.7 

Parameter p 



Figure 8. Density scatter plot of the estimated joint probability distribution p{r,p\V, Models). V is data from the virulent strain. 
doi:1 0.1 371 /journal.pone.008231 7.g008 
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Figure 9. The observed number of cells with n bacteria (blue) compared with 95% credibility intervals (red) predicted by Model 3 
with respect to the virulent strain. 

doi:10.1371/journal.pone.0082317.g009 



A common approach to assessing the fit of a model to data is to 
use a ^-value with respect to some statistic T(y), where y is 
observed data. More formally, the classical ^-value is given by 



p(T(y ,0)>T(y,0)\OM), 



(29) 



where y is a possible future value, and the probability is taken 
over the distribution of y given 0, a single parameter estimate. 

A drawback of (29) is that it does not take account of the 
uncertainty in 0 expressed by the posterior distribution p(0\y,Ai). 
In contrast, the BsLyesian posterior predictive p-value [33,34] 
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Figure 10. The observed number of cells with n bacteria (blue) compared with 95% credibility intervals (red) predicted by Model 5 
with respect to the virulent strain. 

doi:1 0.1 371 /joumal.pone.008231 7.g01 0 
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Figure 1 1 . The observed number of cells with n bacteria (blue) compared with 95% credibility intervals (red) predicted by Model 7 
with respect to the attenuated strain. 

doi:10.1 371/joumal.pone.008231 7.g01 1 



p(T{y)>T(y)\y,M), (30) 
overcomes the problem by using the posterior predictive distribution: 



piy \e,M)p(0\y,M) de. 



piy \yM)-- 



piy ,0\y,M) d0 



The posterior distribution can be simulated by drawing m 
values 6 from p(0\y,M), and then, for each 0, sampling a y from 



1400 
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30 



Figure 12. The observed number of cells with n bacteria (blue) compared with 95% credibility intervals (red) predicted by Model 1 1 
with respect to the attenuated strain. 

doi:1 0.1 371 /joumal.pone.008231 7.g01 2 
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piy \0,M). The resulting m values of y represent draws from 
p{y\y,M)- 

In the context of the Salmonella study, 0 was provided by the 
parameter estimates obtained for p{0\'D,M), m was set to 10000, 
and p(y \0,M) was modelled as a multinomial distribution 

Pi{l}Zmny^M)fl.)=r^. ^^^4^, (31) 

where Y is the total number of counts (cf (22)). 

In order to obtain m values of T{y) drawn from p(y |y,A^), 
each y' drawn from p(y \0,M) is mapped to T{y ,0). 

We used the G-statistic for the test statistic T [35], The G- 
statistic is proportional to the Kullback-Leibler measure of 
distribution divergence, and is given by 

G(.,>l) = 2gO„ln(^), (32) 

where 0„ = C„, and E„(0,M) is the expected value for j^: 

E„{0,M) = yq(n\0,M). 

Applying the above approach for estimating the distribution of 
G under a given model A4, the posterior predictive /i-values for 
{C„}f=i were found to be 0.005 for Model 7 and 0.006 for Model 
1 1 (with respect to the attenuated strain), < 10"'* for Model 3 and 
< 10"'* for Model 5 (with respect to the virulent strain). This 
suggests a poor fit of the models to the data. 

A visual representation of the fit of data to a model A4 can be 
provided by comparing the observed count C„ (the number of cells 
containing n bacteria) to the distribution of m possible count values 
y„ obtained via (31). This visualisation is shown in Figures 9, 10, 
11 and 12. 

Discussion 

The AIC is a common maximum-likelihood approach to model 
comparison, but nested sampling enables a Bayesian approxima- 
tion of model evidence p{T>\M) to be computed, along with the 
advantages of adopting the Bayesian approach. These include 
integration across parameters; estimation of the posterior param- 
eter distributions (with visualisation of parameter correlations); and 
estimation of the posterior predictive distributions for goodness-of- 
fit assessments of the models. 

Under the assumptions used, the most probable models with 
respect to the virulent and attenuated strains of S. enkrica were 
burst-threshold Model 3 (Poisson) and burst-rate Model 7 (jUq 
only), respectively. The next two most probable models were 
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